Rank | Count | Beginning |
---|---|---|
19409 | 1507 | या |
29004 | 570 | हे |
28015 | 460 | हा |
28631 | 338 | ही |
29650 | 251 | ह्या |
12269 | 248 | त्यांनी |
9839 | 247 | ते |
11250 | 223 | त्यामुळे |
22394 | 218 | येथे |
10424 | 211 | त्या |
2875 | 195 | आणि |
9225 | 164 | तसेच |
11989 | 153 | त्यांच्या |
14227 | 151 | पण |
23810 | 133 | व |
11003 | 132 | त्यानंतर |
22052 | 125 | यांनी |
14517 | 124 | परंतु |
4428 | 117 | एक |
2151 | 114 | अशा |
10283 | 110 | तो |
5692 | 106 | काही |
11882 | 106 | त्यांचे |
21862 | 98 | यांच्या |
15204 | 97 | पुढे |
22265 | 93 | येथील |
3829 | 92 | इ.स. |
26315 | 86 | सर्वात |
20191 | 84 | यात |
6872 | 83 | गावात |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
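The same count can be approximated outside the database. The following is a minimal Python sketch, not the tooling used to build the table: the function name `sentence_beginnings` and the sample sentences are hypothetical, and it assumes sentences are available as plain strings with words separated by single spaces, as the query's `substring_index(sentence, ' ', 1)` implies.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Count the first n space-separated words of each sentence.

    Mirrors substring_index(sentence, ' ', n): a sentence containing
    fewer than n spaces is counted as a whole.
    """
    counts = Counter(" ".join(s.split(" ")[:n]) for s in sentences)
    # most_common sorts by descending count, like ORDER BY cnt DESC.
    return counts.most_common(limit)


# Hypothetical sample sentences, not taken from the corpus.
sample = [
    "या गावात शाळा आहे",
    "या वर्षी पाऊस कमी पडला",
    "हे पुस्तक चांगले आहे",
]
print(sentence_beginnings(sample, n=1))
```

Passing `n=2`, `n=3`, or `n=4` gives the longer beginnings covered in the following subsections. Ties are returned in first-seen order, whereas the SQL query leaves the order of ties unspecified.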